Abstract. In this paper we explain the strikingly regular activity of the 'grid' cells in rodent dorsal medial 
entorhinal cortex (dMEC) and the spatially localized activity of the hippocampal place cells in CA3 and 
CA1 by assuming that the hippocampal region is constructed to support an internal dynamical model of the 
sensory information. The functioning of the different areas of the hippocampal-entorhinal loop and their 
interaction are derived from a set of information theoretical principles. We demonstrate through simple 
transformations of the stimulus representations that the double form of space representation (i.e. place 
field and regular grid tiling) can be seen as a computational 'by-product' of the circuit. In contrast to 
other theoretical or computational models we can also explain how place and grid activity may emerge at 
the respective areas simultaneously. In accord with recent views, our results point toward a close relation 
between the formation of episodic memory and spatial navigation. 



1. INTRODUCTION 

When we enter a new place, even without having immediately recognized each of the objects surrounding 
us, we need only a moment to perceive the particular configuration of these objects within their environment 
(the mapping) and define our own relative position (localization or egocentric description) in the same 
environment. Sizing up distances is not of great difficulty either. In doing so we can use approximate learnt 
metric or intrinsic, idiothetic (self-motion based) cues, e.g., the number of steps needed to reach the wall. 
Why does spatial navigation, i.e., mapping, localization and remembering places seem so easy for animals, 
whereas it still constitutes a major challenge in robotics? What are the underlying computations that provide 
us with a metric required to gain not only topological, but also geometrical perception of our environment? 
An explanation of the surprising discovery of 'grid' cells [Hafting et al., 2005a] in the rodent dorsal medial 
entorhinal cortex (dMEC) may offer some answers to these questions. 

In contrast to the spatially localized unimodal activity distribution of the place cells found most prominently 
in the subfields CA3 and CA1 of the rodent hippocampus (HC) [O'Keefe and Nadel, 1978] or, for 
example, in humans [Ekstrom et al., 2003], these grid cells show more or less regular, 
multi-peaked activity that forms a 'hexagrid' tiling of space. Interestingly, in different layers within 
the dMEC, while preserving this compact covering structure, the activity is also modulated by velocity 
and directional information [Sargolini et al., 2006]. Due to this regularity, these cells are thought to maintain 
a metric, and thus provide a basis for self-motion information or 'path-integration' (for a review, see 
[McNaughton et al., 2006]). Although this construct is very appealing, the finding [Barry et al., 2007] that 
grids may faithfully follow the distortion of the (familiar) environment casts doubt on the straightforward 
link between grids and path-integration, as such distortions may point to a topological description instead of 
a metric one [Dabaghian et al., 2007a]. Acknowledging that the functional explanation of these grid structures 
has yet to be found, attention has recently been focused on (1) functional links between grid and place 
cells, and (2) possible mechanisms that would be able to generate such regular structures. As a complete 
review is beyond the scope of this paper, here we only list some of the most recent proposals corresponding 
to these two directions. 

Several models in the first group elaborate on the ideas described in [Sharp, 1991]: competitive learning 
resulting in sparse representations may explain the formation of place cells in the dentate gyrus 
(DG) [Rolls et al., 2006] and in CA3 and CA1 [Franzius et al., 2007]. In these models the existence of 
an appropriately defined set of regular grid inputs is the most stringent hypothesis. Another route is 
based on the ideas of [Cash and Yuste, 1999] on linearity and the proposals in [O'Keefe and Burgess, 2005], 
[McNaughton et al., 2006]: it has been shown [Solstad et al., 2006] that place cells can easily be formed if 
anatomically and physiologically sound constraints are taken into account. The problem with this model 
is that it requires grids with diverse orientations, but recent reports [Barry et al., 2007], [Fyhn et al., 2007] 
show more uniformly oriented grids. 

Similar ideas provide the basis for models of the second group: linear summation of harmonic 
functions forms the core idea of different oscillatory interference models [O'Keefe and Burgess, 2005], 
[Burgess et al., 2007]. In this dynamic model grid cells receive directionally modulated oscillating dendritic 
inputs superimposed on somatic large-scale oscillations occurring at 4-10 Hz (theta oscillation). With appropriate 
directional modulation provided by subicular head-direction cells [Ranck, Jr., 1984], [Taube et al., 1990] 
this model yields regular interference patterns. To enable path-integration, grid patterns should be precisely 
bound to environmental cues, because error can accumulate in both motor signals (speed) and direction 
signals. Feedback from CA1 has been suggested to provide the necessary correction and thus to maintain 
the coherence of the oscillations by regulating phase resetting. However, as CA1 is one step downstream 
of the superficial layers of the entorhinal cortex (EC), it is not obvious why it would receive at the same 
time a more direct sensory stimulus compared to the information available at the entorhinal cortex. A specific 
class of continuous attractor models has also been proposed either with periodic boundary conditions 
[McNaughton et al., 2006] or with aperiodic boundaries, but with highly restrictive symmetry constraints 
on the synaptic connection matrix [Fuhs and Touretzky, 2006]. These models achieve path-integration using 
the grids and can explain many important aspects of the biological system, e.g., the similar orientation of the 
grids, scaling and phase properties. However, the correct integration of signals to perform path-integration is 
very sensitive to factors related only to the model setup, not to the system at hand [Burak and Fiete, 2006]. 
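
The idea that a linear summation of harmonic functions can produce a regular tiling can be illustrated with a 
minimal static sketch. It is not the implementation of any of the cited models (which are dynamic and involve 
theta-band oscillations); it only assumes three cosine gratings whose wave vectors are 60 degrees apart. The 
function name `grid_pattern` and its parameters `spacing` and `orientation` are hypothetical and chosen for 
illustration. 

```python
import numpy as np

def grid_pattern(x, y, spacing=1.0, orientation=0.0):
    """Sum of three cosine gratings with wave vectors 60 degrees apart.

    Illustrative assumption: a static superposition, not a model of
    dendritic or somatic oscillations.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Wave number chosen so that neighbouring peaks are `spacing` apart.
    k = 4.0 * np.pi / (np.sqrt(3.0) * spacing)
    angles = orientation + np.deg2rad([0.0, 60.0, 120.0])
    # Linear summation of the three harmonic components.
    return sum(np.cos(k * (x * np.cos(a) + y * np.sin(a))) for a in angles)

# Evaluate the pattern on a small square arena (arbitrary units).
xs, ys = np.meshgrid(np.linspace(-2, 2, 200), np.linspace(-2, 2, 200))
rate_map = grid_pattern(xs, ys, spacing=0.5)
```

The maxima of `rate_map` lie on a hexagonal lattice: the pattern is unchanged by a 60 degree rotation about the 
origin, and neighbouring peaks are separated by `spacing`. 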

In this paper we sketch an alternative view of the problem of grid cells. Unlike the models described 
above, which attempt to explain a particular phenomenon or computation assigned to a given area, we 
describe a functional model of the hippocampal region (HR, comprising the entorhinal cortex, the dentate 
gyrus, areas CA3 and CA1, para- and presubiculum and the subiculum; see [Witter and Amaral, 2004], 
[Mohedano-Moriano et al., 2007]) in which spatial navigation and space representation are addressed within 
the more general context of efficient memory systems. Explanation of the connections among different 
memory functions, such as the formation of episodic memories, memory consolidation and retrieval, has long 
been recognized as one of the major challenges in neuroscience, and several attempts have already been 
made to provide a unifying view [Levy, 1996], [Recce and Harris, 1996], [Wallenstein et al., 1998], [Gaffan, 1998], 
[Redish, 1999]. Albeit with different emphases, similar motifs emerge in most models. One such motif is 
that the context for separate episodic memory traces corresponds to the environment of the actual position. 
While this metaphor may help to conceptualize the acquisition of new memory traces, it does little to 
further our understanding of retrieval (that is, the actual usage) and consolidation [Nadel et al., 2007] of this 
knowledge, as well as the role of the HR in these tasks. Here we show that the information theoretic notion of 
efficient representation may link these diverse functions and lead to a large-scale computational model of the 
hippocampal region in which the intriguing grid-like activity pattern may naturally emerge. The proposed 
architecture is partly rooted in the functional comparator model described in [Lorincz and Buzsaki, 2000], 
[Lorincz et al., 2002] and is strongly motivated by new theoretical results on blind source separation problems 
[Póczos and Lorincz, 2005], [Póczos and Lorincz, 2006], [Szabó et al., 2007]. 



In the Methods section, the theoretical motivations for efficient representation are presented. Afterwards, 
relevant anatomical and physiological properties of the hippocampal region are highlighted to support the 
resulting mapping. In the Results section (1) we formalize our model according to the motivations described, 
(2) explain the functional correspondence between the theoretical construct and the neural substrate (func- 
tional mapping) and (3) present model verifying simulations that show how our model exhibits characteristic 
spatial behavior similar to that found in different parts of the HR. In the last section we discuss the relevance 
of our findings, interpret our results and make predictions concerning the functioning of the HR. Finally, 
some relevant but unresolved issues are enumerated. 



2. Methods 



We begin with some definitions that we use throughout the paper. Then we highlight the central motivations 
behind our large-scale functional model. The model is not yet extended to low-level cellular and 
network mechanisms and thus the mapping of the proposed function to the neurobiological structure is 
essentially a logical arrangement of known anatomical and physiological findings. 

Theoretical motivations 

We propose a hypothesis set based on theoretical considerations. Then we enumerate the supporting 
arguments for each hypothesis and explain the essential statistical concepts that form the core of our proposal. 

We use the term 'memory' for internal representations of spatio-temporal patterns of observations that in 
some way help the system (agent or animal) to analyze, predict and react to changes (used in a very broad 
sense). Here observation incorporates not only the perception of the external world, but also the registering 
of the internal states of the self: motor commands, emotions, goal-oriented behavior and so on. In this 
framework, sensory-motor binding, for example, amounts to forming an intermediate representation that can 
faithfully represent the complex observations in a compressed form which is then used to define the response 
to those observations. 

Motivated by ideas in machine learning, information theory and goal-oriented reinforcement learning, one 
can make the following hypotheses about an efficient memory system: 

• Prediction: In order to increase the chance of survival under varying conditions, memory creation 
should serve detection of novelty or change. 

• Probabilistic interpretation: Due to the stochastic nature of changes, representations may only 
be interpreted within a probabilistic framework. 

• Information separation and fusion: For tractable probabilistic inference, the effect of the 'curse 
of dimensionality' has to be efficiently diminished through the discovery of the independence of the 
underlying causes of the changes experienced. 

Prediction 

In line with [Rao and Ballard, 1997], [Friston, 2005] we hypothesize that the goal of the memory system is 
to help maintain, accelerate and fine-tune a predictive coding mechanism (for a review on predictive coding 
in the brain, see [Kveraga et al., 2007]). The predictive faculty is needed for two reasons: not only does 
the agent/animal have to interact with a changeable environment, but functional delays (reaction time, 
internal functioning, synaptic delays) also have to be compensated. Models of predictive coding usually 
employ loops that allow comparison of bottom-up signals ('input') and expected signals ('output') of the 
internal dynamical model of the observations. It has already been proposed that the HR [Szirtes et al., 2005] 
realizes a Kalman-filter like internal model to predict sensory signals. Interestingly, some recent results 
[Lorincz and Szabo, 2007] on the approximation of independent processes (that is, dynamical models that 
assume independent noise as opposed to the Gaussian noise assumption of the Kalman-filter approach) may 
provide a natural combination of efficient prediction and information extraction, thus serving both the first 
and the third hypotheses. 

Probabilistic interpretation 

Alternatively, the expected signals may come from a generative model [Hinton and Ghahramani, 1997] 
which seeks probabilistic sources that could make up or cause the perceived signals: the hidden sources 'explain' 
the observed signals. Such a statistical approach is useful in that the system has to cope with multiple 
uncertainties: noisy signals, hidden causes, faulty internal working, multiple potential interpretations. The 
learned spatio-temporal structure of the hidden sources restricts the representations of the world and, in 
turn, can be used for inference in a Bayesian manner [Kording and Wolpert, 2004]. The computational motivation 
for seeking the hidden causes is to reduce the daunting problem of inference: the detected temporal 
changes are either causally related and can thus be predicted or are intrinsically independent. If the causes 
are statistically independent then their joint probability distribution may be factored. 
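
The gain from factoring can be made concrete with a small numerical sketch. The marginal distributions below 
are made-up illustrative values, and the helper `table_sizes` is a hypothetical name; the example only assumes 
discrete causes with a finite number of states. 

```python
import numpy as np

# Hypothetical marginals of two independent causes (illustrative values only).
p_a = np.array([0.7, 0.2, 0.1])
p_b = np.array([0.5, 0.25, 0.15, 0.1])

# Under independence the joint distribution is the product of the marginals,
# so it never has to be stored or estimated as a full table.
joint = np.outer(p_a, p_b)

def table_sizes(n_causes, n_states):
    """Entries in the full joint table versus the factored representation."""
    return n_states ** n_causes, n_causes * n_states

full, factored = table_sizes(n_causes=10, n_states=4)
```

With ten independent causes of four states each, the full joint table has 4^10 = 1,048,576 entries, whereas the 
factored form needs only 40. 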

The probabilistic framework has an added advantage compared to a deterministic encoding mechanism: 
the belief of the system in its own judgment (e.g. about the existence of a particular source) may also be explicitly 
encoded or maintained to support further inference [Yu and Dayan, 2003]. Reconstruction networks 
[Grossberg, 1980], [Ullman, 1995] try to integrate the 'best of both worlds': by maintaining an internal model 
of the external world, fast manipulations of the sensory-motor integration (modulation, planning, and so on) 
can be achieved. On the other hand, by extracting useful statistics of the incoming signals, robustness against 
noise and novelty detection may also be realized. To the best of our knowledge, the first reconstruction network 
model for brain modeling that suggested approximate pseudo-inverse computation for information processing 
between neocortical areas was published by Kawato et al. [Kawato et al., 1993]. The computational 
model of the neocortex was extended by Rao and Ballard [Rao and Ballard, 1997], [Rao and Ballard, 1999], 
who considered neocortical sensory processing as a hierarchy of Kalman-filters. 

The reconstruction idea has also appeared in hippocampal models [Lorincz, 1998]. An extension 
of that model [Lorincz and Buzsaki, 2000] suggested the integration of the early comparator idea 
[Sokolov, 1963], [Vinogradova, 1975]. In these models, the whole EC-HC circuitry forms a 'novelty'-detecting 
network, in which novelty or reconstruction error is the difference between the expected (top-down) 
and experienced (bottom-up) neuronal representations. The proposed model successfully predicted independence 
in the cellular activity in CA1 [Redish et al., 2001] and was the first to suggest distinct 
roles for the direct and tri-synaptic pathways [Kloosterman et al., 2004]. A reconstruction-network like 
mechanism [Hasselmo et al., 2002] connecting CA3 and CA1 has been suggested that directs the information 
flow during encoding and retrieval. In another study [Becker, 2005], each hippocampal layer forms a separate 
representation that could be transformed linearly to reconstruct the original activation patterns in the EC. 

These lines of arguments lead to the first assumption about functional mapping: the HR may be considered 
as a reconstruction-network with predictive capacity. 
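
A minimal linear sketch may help to fix the notion of a reconstruction network. It is an illustration under 
strong assumptions, not the model proposed in this paper: the top-down (generative) matrix `W`, the function 
name `reconstruct` and the step size are all hypothetical. The hidden activity is corrected by the reconstruction 
error until the top-down reconstruction matches the bottom-up input as closely as possible, which approximates a 
pseudo-inverse computation. 

```python
import numpy as np

def reconstruct(x, W, n_iter=2000):
    """Infer hidden activity h so that W @ h approximates the input x.

    W is an assumed linear top-down (generative) mapping; the loop is plain
    gradient descent on the squared reconstruction error.
    """
    step = 1.0 / np.linalg.norm(W, 2) ** 2   # 1 / (largest singular value)^2
    h = np.zeros(W.shape[1])
    for _ in range(n_iter):
        error = x - W @ h                    # reconstruction error ('novelty')
        h = h + step * W.T @ error           # error corrects the hidden model
    return h, x - W @ h

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 3))                  # 3 hidden causes, 8 input channels
x_familiar = W @ rng.normal(size=3)          # input the model can explain
x_novel = x_familiar + rng.normal(size=8)    # input with an unexplained part

h_hat, err_familiar = reconstruct(x_familiar, W)
_, err_novel = reconstruct(x_novel, W)
```

For the familiar input the residual error vanishes, whereas for the novel input a non-zero residual remains; in 
this toy setting the residual plays the role of the novelty signal. 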



Information separation and information fusion 

For any probabilistic reasoning, we have to define the elementary events that make up all possible out- 
comes. Without knowing their true probability distribution, we need to sample them (by experiencing differ- 
ent outcomes) and approximate the unknown distribution. This task becomes computationally intractable 
as the number of possible events increases. Furthermore, discretization of the space-time continuum, i.e., 
sampling, is another source of noise and computational explosion. 

Consider, for example, the problem of sequence learning [Fusi et al., 2007]. If we want to take into account 
all pieces of sensory information at each moment during which the system is able to take a sample, we can 
only store sequences of limited temporal duration. Furthermore, the number of patterns that make up the 
sequence is not known beforehand. In turn, the system should be able to flexibly compress the spatiotemporal 
patterns into an internal form which (1) is subject to memory capacity constraints, but (2) still preserves all 
relevant information concerning the ongoing events. To do so, information should be collected, represented, 
and possibly compressed over time, because (spatial) changes take place at different temporal scales compared 
to the internal clock. Motion induced visual changes, for example, imply that part of the information is lost 
unless it is remembered in some economical form. 

Temporal compression can be achieved by implementing a predictive system which can recover (explain 
in simpler terms) the deterministic parts of a stochastic process. If the predictable part is extracted, the 
rest of the available information (the so called 'innovation') has reduced temporal correlation. 
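
The following sketch illustrates this point on a toy signal. It assumes a scalar first-order autoregressive 
process with Laplace-distributed driving noise; the coefficient 0.9, the signal length and the helper 
`lag1_autocorr` are illustrative choices, not quantities taken from the model. 

```python
import numpy as np

rng = np.random.default_rng(1)
T, F_true = 20000, 0.9                 # illustrative length and dynamics
e = rng.laplace(size=T)                # driving noise (the 'innovation')
s = np.zeros(T)
for t in range(T - 1):
    s[t + 1] = F_true * s[t] + e[t + 1]   # predictable part plus innovation

# Least-squares estimate of the predictable (deterministic) part.
F_hat = np.dot(s[1:], s[:-1]) / np.dot(s[:-1], s[:-1])

# What remains after removing the prediction is the estimated innovation.
innovation = s[1:] - F_hat * s[:-1]

def lag1_autocorr(v):
    """Sample autocorrelation at a time lag of one step."""
    v = v - v.mean()
    return np.dot(v[1:], v[:-1]) / np.dot(v, v)

corr_signal = lag1_autocorr(s)
corr_innovation = lag1_autocorr(innovation)
```

In this example the raw signal is strongly correlated from one time step to the next, while the estimated 
innovation is nearly uncorrelated in time. 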

On the other hand, if the independence of the underlying causes may be assumed (as noted above), 
information transfer can be optimized by forcing independence among the components of the emerging 
representation [Jutten and Herault, 1991], [Comon, 1994], [Cichocki et al., 1994], [Laheld and Cardoso, 1994], 
[Bell and Sejnowski, 1995], [Amari et al., 1996]. 



Importantly, the very same assumption may greatly simplify the predictive modeling as well. This is in 
line with Barlow's revised formulation of the redundancy reduction principle [Barlow, 2001]: representations 
should not be rigid structures but rather tools that serve the animal's current (that is variable) goals. They 
should therefore appropriately map the changing statistics of the world they represent. 

Elaborating on his idea about obvious (simple) and 'hidden' forms of redundancy (see [Barlow, 2001] and 
the references therein), the second main functional conjecture in our model is that the HR maximizes information 
transfer throughout the neural circuitry (by reducing the obvious redundancy) and at the same time reveals 
the hidden structures by separating them into independent subspaces. That is, the learning system reveals 
the types of approximately independent sources and their own intrinsic dimensionality. 

To highlight this issue, consider the problem of space representation formed by the HR. The main 
input through MEC to the HR is primarily multimodal sensory information with implicit and limited 
spatial information content [Fyhn et al., 2004], such as direction or configuration. Configuration of objects 
can be interpreted as one of the independent descriptors of the environment. However, its true or 
approximate dimension can only be revealed if the system is able to detach the corresponding correlations 
among the components of the representation from those that carry information about other physical 
aspects, such as texture or color. Such separation may reveal that configuration may best be described 
in a 2 or 3 dimensional space that actually corresponds to our abstract notion of Euclidean space 
[Dabaghian et al., 2007a], [Dabaghian et al., 2007b]. 

Dimensionality, in general, may not be well defined for the other physical aspects, such as texture or color; 
see, e.g., [Ben-Shahar and Zucker, 2004]. Note that these descriptors, or 'factors', presuppose one another, but 
they are also highly independent. This dichotomy can be exploited in the following way. On the one hand, 
there is combinatorial gain in the description of events if characterization, categorization and prediction 
of the factors take place separately. For instance, a snapshot of an animal is a static image containing 
no direct information concerning motion. Still, the particular combination of factors or components of the 
animal may help to draw inferences concerning the unseen parts of the animal and the (intended) direction 
of motion. Pattern completion can be seen as a particular inference problem that occurs in space and time. 

Interestingly, as new results [Póczos and Lorincz, 2005] on blind source separation show, factorial coding 
and subspace separation can be achieved simultaneously. In blind source separation problems, not only 
the sources, but also the mixing process that generates the received signals are unknown. In general, this 
problem cannot be solved without regularization. Assuming independence in time seems plausible in many 
problems. For the special case of instantaneous linear mixtures of (statistically) independent and identically 
distributed (i.i.d.), one dimensional sources, where the dimension of the signal is larger than or equal to 
the dimension of the sources, there exist efficient, neurally plausible Independent Component Analysis 
(ICA) algorithms [Giannakopoulos et al., 1998], [Linsker, 1999] that can recover the true non-Gaussian lower 
dimensional sources by demixing the signal. 

ICA can be significantly faster [Amari et al., 1996] if separation is preceded by whitening. This intermediate 
transformation reduces the instantaneous (zero time lag or spatial) second-order correlations (i.e., 
it decorrelates) and it also normalizes the signals. Informally, decorrelation transforms the data onto an 
orthogonal subspace such that the projection of the data onto the first (principal) direction of the subspace 
has the greatest variance, projection on the second principal direction has the second greatest variance and 
so on. The decorrelation part is also called Principal Component Analysis (PCA) and may be used for dimension 
reduction in an informed way as it provides a measure of how much information (at the second-order 
correlation level) is lost by ignoring the last k directions or components. Whitening assumes that all sources 
may be equally important, so after the decorrelation step it equalizes the variances of the components. The 
terms 'whitening' and 'decorrelation' may be used interchangeably, but they scale the results differently. 
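
The two steps, whitening followed by separation, can be sketched on a toy two-dimensional mixture. The 
separation step below is a simple textbook substitute (a search for the rotation that maximizes the absolute 
kurtosis of the components) and is not one of the neurally plausible algorithms cited above; the mixing matrix, 
the uniform sources and the function names `whiten`, `kurtosis` and `ica_2d` are illustrative assumptions. 

```python
import numpy as np

def whiten(X):
    """PCA whitening: decorrelate the rows of X, then equalize their variances."""
    Xc = X - X.mean(axis=1, keepdims=True)
    cov = Xc @ Xc.T / Xc.shape[1]
    eigval, eigvec = np.linalg.eigh(cov)
    order = np.argsort(eigval)[::-1]          # principal directions first
    eigval, eigvec = eigval[order], eigvec[:, order]
    decorrelated = eigvec.T @ Xc              # PCA (decorrelation) step
    return decorrelated / np.sqrt(eigval)[:, None]

def kurtosis(v):
    """Excess kurtosis of a zero-mean, unit-variance signal."""
    return np.mean(v ** 4) - 3.0

def ica_2d(Z, n_angles=360):
    """Rotate whitened 2-D data to maximize the non-Gaussianity of both rows."""
    best, best_score = Z, -np.inf
    for theta in np.linspace(0.0, np.pi / 2, n_angles, endpoint=False):
        c, s = np.cos(theta), np.sin(theta)
        Y = np.array([[c, s], [-s, c]]) @ Z
        score = abs(kurtosis(Y[0])) + abs(kurtosis(Y[1]))
        if score > best_score:
            best, best_score = Y, score
    return best

rng = np.random.default_rng(2)
S = rng.uniform(-1, 1, size=(2, 5000))        # independent non-Gaussian sources
A = np.array([[1.0, 0.6], [0.4, 1.0]])        # mixing matrix, unknown to the method
X = A @ S                                     # observed mixtures
Z = whiten(X)                                 # identity covariance
Y = ica_2d(Z)                                 # sources up to order and sign
```

After whitening, only a rotation remains to be found, which is why the search space of the separation step 
shrinks. The recovered components match the original sources only up to permutation and sign, as is generally 
the case for ICA. 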

For more general cases of ICA, there is no trivial solution, but as both experience 
[Cardoso, 1998], [Póczos and Lorincz, 2005] and several theoretical advances [Szabó et al., 2007], 
[Póczos et al., 2007] have indicated, sources can in many cases be recovered even if the conditions (independence, 
i.i.d. properties or equal dimensions) are not met. The recovered components can be grouped by their mutual 
information (that is, using the 'non-independence' information), thus revealing the number of separable sources and 
the dimensions of their subspaces. This procedure factorizes the information and gives rise to combinatorial 
gains in the storage requirements. In addition, recent theoretical findings suggest that the search for these 
factors can be accelerated in a non-combinatorial way [Póczos and Lorincz, 2006], [Lorincz and Szabo, 2007], 
[Szabó et al., 2008] even if the dimensions of the subspaces are not known beforehand [Póczos et al., 2007]. 



3. Known anatomical and physiological constraints 

In this section, we describe those characteristics of the HR that guide and constrain our model. The 
circuitry of HR (left panel of Fig. 1) has several unique properties that probably contribute to its central 
role in all memory functions. Here we highlight features that seem relevant for mapping the functions onto 
the neural substrate. 

Figure 1: (a): Diagram of the main connections of HR. Arrows denote excitatory connections and solid 
circles denote mostly inhibitory connections. (b): Connections playing a role in the model. Roman numerals 
denote the layers in the entorhinal cortex (EC). 
x: signal from cortex, 
y: whitened input at EC III, 
n_y: whitened novelty (or innovation) of the input at EC II, 
h: hidden model at EC deep layers, 
n_h: innovation of the hidden model at EC deep layers, 
s: ICA output at CA1 during positive theta phase, 
e: ICA output at CA1 during negative theta phase, 
R_II and R_III: postrhinal to EC II and postrhinal to EC III efferents, respectively, 
Q_II and Q_III: EC deep layers to EC II and EC III connections, respectively, 
K: inhibitory feedback from EC III to EC II, 
V: CA1 to EC deep layer efferents, 
M_h: recurrent collaterals at the deep layers of the EC, 
W_tri and W_dir: tri-synaptic and direct connections between EC superficial layers and the CA1 subfield, 
respectively. 



Direction of information flow 

First, there is a dominantly unidirectional ([Naber et al., 2001], but see [Shao and Dudek, 2005]) and 
parallel connection system among all parts: superficial layers of EC receive input from adjacent cortical 
regions and transmit the signals toward CA1 and the subiculum mediated by CA3. This transmission, 
however, is not a simple relay: it takes place in a tightly controlled way using two separate routes: the 
so-called tri-synaptic connection system (EC II-DG-CA3-CA1) and the direct route from EC III to CA1. 
As the exact nature of the input received by EC II and EC III is not known and we want to focus on the 
functioning within the HR, we assume that the superficial layers share the same cortical input. We also 
assume that differences in the activity of these layers stem from their differing intrinsic physiology (e.g. 
the ratio of interneurons that enables strong feedforward inhibition in EC II), anatomy (role of recurrent 
collaterals) and the received feedback (EC layers V/VI project back to both layers and EC III receives signals 
from the subiculum, too). 

CA1 and the subiculum, which are considered to be the main output regions of the HR, project back 
to the deep layers of EC. In parallel with the subicular pathway, CA1 is linked to the deep layers directly 
as well. The parallel systems in part preserve topographical arrangement [Witter, 2006] but there exists a 
separation along the lateral to medial direction. The lateral and medial parts of the entorhinal cortex (LEC 
and MEC, respectively) receive input from different cortical areas and, in turn, project to non-overlapping 
portions of CA1 and the subiculum. In contrast, DG and CA3 receive convergent input from both LEC and 
MEC. An important functional consequence is that the fusion of spatial and non-spatial information may be 
strictly controlled within HR [Gigg, 2006], [Witter and Moser, 2006]. 

The EC deep layers, which presumably also receive modulatory or control signals from different cortical 
areas, close the loop: they send mostly excitatory [van Haeften et al., 2003] feedback to the superficial layers. 



Unique intra-regional interactions in each area 

Although place cells can be found everywhere in DG, CA3 and CA1, their coding mechanism may be 
quite different, as the underlying connection systems have significantly distinct features. DG is unique 
for its temporally tunable connections [Henze et al., 2002]. CA3 has a dense collateral system which has a 
particular role in memory replay [Louie and Wilson, 2001], [Foster and Wilson, 2006], [Diba and Buzsaki, 2007], 
[Csicsvari et al., 2007], [O'Neill et al., 2008]. CA1, as a single exception in the whole circuitry, has no recurrent 
collaterals and the activity of the principal cells seems to be independent [Redish et al., 2001]. 



Temporal synchrony across and within different areas 

In addition to the intricate anatomy, the physiology of the separate modules is also striking. The 
most prominent feature is the interplay between different forms of oscillatory activities, the synchronized 
membrane potential oscillation between the 4-10 Hz theta and the 40-100 Hz gamma frequency bands 
[Bragin et al., 1995], [Canolty et al., 2006], which have differential effects on the different modules. Several 
functional roles have already been assigned to these activity forms, such as the control of synchrony 
throughout the circuitry [Denham and Borisyuk, 2000] or the provision of an internal reference clock 
[Jefferys et al., 1996], [Jensen et al., 1996]. 

The main generator of theta is thought to be in the septum (which is the only extra-hippocampal target 
of CA3), but layer EC II may also be able to initiate theta activity. The reciprocity between the subiculum 
and the HR via CA3 may suggest that HR has a sophisticated mechanism for self-regulating synchrony. In 
addition, EC II neurons are theta modulated and show phase precession, similarly to the place cells in the 
hippocampus [Hafting et al., 2005b]. 

EC III, which is very close to layer EC II, however, is phase locked to the main theta and can maintain 
persistent activity [Tahvildari et al., 2007]. Deep layers of the EC show peculiar functioning as well. In 
contrast to the superficial layers, EC V can generate input specific graded persistent activity in individual 
neurons [Egorov et al., 2002] which is generally considered the underlying neural mechanism of working 
memory [Goldman-Rakic, 1995]. Furthermore, the relative homogeneity of the CA1 response to changing 
inputs as compared to that seen in the deep EC may suggest [Frank et al., 2006] that active CA1 neurons 
are engaged in representing one environment, while deep EC may contain multiple subpopulations, some 
tied to CA1 output while others are more independent of CA1. Interestingly, separate modules or 'cell 
islands' can be found in EC II as well [Witter and Moser, 2006]. As a consequence, if deep layers can 
represent several likely models concerning the world, there should be a switching mechanism that can help 
select the one that best serves correct predictive coding. It is intriguing that layer III of the EC has been 
found to receive such switching signals [Tahvildari et al., 2007]. Last, but not least, signals carrying different 
aspects of spatial information, such as position, head-direction or speed, seem to interfere at several stages. 
While activity in CA3 and CA1 doesn't show correlation with directional information, postsubicular head-direction 
cells directly innervate the deep layers of EC, which in turn send this information to the superficial 
layers. According to this scenario, grid cells in EC III show clear conjunctive correlation representing mixed 
information at the same time [Sargolini et al., 2006]. However, the activity of neurons in EC layer II is free 
of directional modulation. 



4. Results 

In the first part we formalize the proposed functions by providing a mathematical construction. In the 
resulting computational model the different functional modules are not yet anchored to the real system. Since 
this reverse-engineering approach (assignment of the functions precedes the description of the structure) is 
essentially ill-posed (offering several solutions), in the second part we attempt to map the modules onto the 
real neural system by taking into account the biological constraints collected in the previous section. Finally, 
simulations are presented in which the function of the model is demonstrated on inputs that can be related 
to signals received by the hippocampal region. 



Results I: Formal description of the functional model 

Let us assume the system's goal is to form an efficient representation of the sensory information which can 
be used for prediction. Efficiency refers to storage capacity (a small number of 'factors' should be used to 
reconstruct a large number of possible inputs) and speed (the system should try out only a few combinations of 
the factors). Prediction is the ability to generate expected inputs. Let us begin with an abstract description 
of the observation of the external world. (At this point we do not model different sensory modalities; the 
input variable is simply a description of the external world.) The sensory input $x(t)$ to the system may be 
assumed to be a mixture of hidden source signals or causes: 

(4.1) $x(t) = A s(t),$ 

where $A \in \mathbb{R}^{n \times n}$ is a mixing matrix, and $s(t) \in \mathbb{R}^{n}$ are the sources to extract. Regarding our hypothesis 
(3), ICA is designed to solve a similar problem under the condition that the components of $s$ are i.i.d. and 
statistically independent. However, the observed quantities may not be i.i.d.; they may instead follow an 
autoregressive process, 

(4.2) $s(t+1) = F s(t) + e(t+1),$ 

where $e(t)$ is called the 'driving noise', 'true source', or 'innovation'. The expression 'driving noise' refers 
to the fact that process $s$ is maintained by the 'true source' $e$: without this input, $s(t)$ would decay. Due 
to the mixing effect of matrix $F$, which describes the deterministic part of the process, the components 
of $s(t)$ are no longer independent. Obviously, one can envision more sophisticated systems. Nevertheless, 
for higher order processes or signals with echoes, the formalism can be brought to very similar forms 
[Lorincz and Buzsaki, 2000], [Szabó et al., 2007], [Póczos et al., 2007]. As long as the components of the true 
source $e(t)$ can be considered independent, the efficient representation can again be achieved by extracting 
these components. If the dynamics are 'weak' in the sense that only weak temporal correlations are introduced 
by $F$, then we arrive at the original ICA problem. Because we are interested in the causes, i.e., in the 
driving noise, we need to learn both the autoregressive process ($F$) and the mixing process ($A$). This can 
be achieved [Lorincz and Szabo, 2007] only if the components of the true driving noise are independent. Under 
the normal (Gaussian) noise assumption the effects of these processes cannot be distinguished. We need to 
carry out some manipulations in order not to misguide ICA. 
We make use of the identities 

(4.3) $x(t+1) = A s(t+1) = A F s(t) + A e(t+1)$ 

to get 

(4.4) $x(t+1) = M x(t) + n(t+1),$ 

where $n(t+1) = A e(t+1)$ and $M = A F A^{-1}$, under the assumption that matrix $A$ can be inverted. Thus, 
both Eq. (4.2) and Eq. (4.4) have autoregressive forms. Due to the mixing effect of $A$ (Central Limit 
Theorem), the distribution of $A e(t+1)$ is more Gaussian-like than that of the true sources. It implies that 
the standard solution of the Gaussian autoregressive processes can be applied as the first step to unfold the 
hidden processes. 
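
As a quick numerical check of Eqs. (4.1)-(4.4), the following sketch draws one step of the hidden process and verifies that the observations obey the autoregressive form with $M = A F A^{-1}$. The dimension, the random seed and the particular matrices $A$ and $F$ are illustrative assumptions, not values taken from the model.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4  # illustrative dimension (assumption)

# Mixing matrix A: identity plus a small random perturbation (assumed example).
A = np.eye(n) + 0.2 * rng.normal(size=(n, n))
# Autoregressive matrix F of the hidden process (assumed; all |eigenvalues| < 1).
F = np.diag([0.9, 0.6, 0.3, 0.1])

s = rng.normal(size=n)            # hidden state s(t)
e_next = rng.laplace(size=n)      # non-Gaussian driving noise e(t+1)
s_next = F @ s + e_next           # Eq. (4.2)

x, x_next = A @ s, A @ s_next     # Eq. (4.1) at times t and t+1
M = A @ F @ np.linalg.inv(A)      # predictive matrix of Eq. (4.4)
n_next = A @ e_next               # mixed innovation n(t+1)

# Eq. (4.4) should hold up to floating point error.
residual = x_next - (M @ x + n_next)
print(np.max(np.abs(residual)))
```
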

Now let us suppose we have a tunable system and our task is to find the hidden process s and the 
driving source e using only the observation x(t). In what follows, we distinguish approximations of the true 
quantities by a small hat. 

First, one can remove the autoregressive part by estimating matrix $M$ through the minimization of the 
following cost function 

(4.5) $J(\hat{M}) = \frac{1}{2} \sum_t |x(t+1) - \hat{M} x(t)|^2$ 

for all available data pairs $(x(t+1), x(t))$. Then, we have a model that predicts the next expected input 

(4.6) $\hat{x}(t+1) = \hat{M}(t) x(t)$ 

and we can estimate the innovation, i.e., the difference between the observed input and the expected input 
at time $t$: 

(4.7) $\hat{n}_x(t) = x(t) - \hat{x}(t).$ 

For Gaussian $\hat{n}_x(t)$ the minimization of Eq. (4.5) leads to the following gradient rule: 

(4.8) $\Delta \hat{M}(t+1) = \alpha_t (x(t+1) - \hat{M}(t) x(t)) x(t)' = \alpha_t \hat{n}_x(t+1) x(t)',$ 

where prime $'$ denotes the transposed form for vectors and also for matrices, and $\alpha_t$ is the learning 
rate. If $\alpha_t$ diminishes according to some suitable schedule then $\hat{M}(t)$ converges to the real $M$ 
[Robbins and Monro, 1951]. In what follows, the learning rules will be written as 

(4.9) $\Delta \hat{M}(t+1) \propto \hat{n}_x(t+1) x(t)',$ 

where the sign '$\propto$' denotes the Robbins-Monro schedule. Note, however, that if the world is changing then 
it is better to maintain adaptation forever. 
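
The gradient rule of Eq. (4.8) can be illustrated with a minimal sketch. The ground-truth matrix, the Gaussian innovation, the number of steps and the schedule $\alpha_t = 1/(100 + t)$ are all assumptions chosen for illustration; the schedule is merely one that satisfies the usual Robbins-Monro conditions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed ground-truth dynamics (stable: spectral radius below 1).
M_true = np.array([[0.5, 0.2, 0.0],
                   [-0.2, 0.5, 0.1],
                   [0.0, -0.1, 0.4]])
n = M_true.shape[0]

M_hat = np.zeros((n, n))          # initial estimate of M
x = np.zeros(n)
for t in range(20000):
    x_next = M_true @ x + rng.normal(size=n)   # Eq. (4.4) with Gaussian innovation
    innovation = x_next - M_hat @ x            # estimated innovation, Eqs. (4.6)-(4.7)
    alpha = 1.0 / (100.0 + t)                  # decaying learning rate (assumed schedule)
    M_hat += alpha * np.outer(innovation, x)   # Eq. (4.8)
    x = x_next

error = np.linalg.norm(M_hat - M_true)         # Frobenius norm of the estimation error
print(error)
```

With a constant instead of a decaying `alpha`, the same loop keeps adapting, which corresponds to the remark about a changing world.
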

So far we have exploited the Gaussianity property of the driving noise to learn the dynamical system. 
Now we can make use of the fact that upon convergence, the innovation term also converges to the mixed 
true sources of Eq. (4.1) ($\hat{n}_x(t) \to A e(t)$). In turn, simple separation of the innovation yields the demixing 
process $\hat{W}$, which is the approximation of the inverse of the mixing matrix: $\hat{W} = \hat{A}^{-1}$. Then $\hat{e}(t) = \hat{W} \hat{n}_x(t)$ 
is the approximation of the true sources, whereas $\hat{s}(t) = \hat{W} x(t)$ approximates the hidden process. 

One can approximate the autoregressive matrix $F$ using quantities $x$, $\hat{M}$, and $\hat{e}$. The goal of the 
approximation is to optimize prediction, that is, to minimize the following cost function: 

(4.10) $J(\hat{F}) = \frac{1}{2} \sum_t |\hat{s}(t+1) - \hat{F}(t) \hat{s}(t)|^2.$ 

As with matrix $M$, matrix $F$ can be learned through the following gradient rule: 

(4.11) $\Delta \hat{F}(t+1) \propto (\hat{s}(t+1) - \hat{F}(t) \hat{s}(t)) \hat{s}(t)' = \hat{e}(t+1) \hat{s}(t)',$ 

that is, 

(4.12) $\Delta \hat{F}(t+1) \propto \hat{W} \hat{n}_x(t+1) (\hat{W} x(t))'.$ 

This strategy has been detailed in [Lorincz and Szabo, 2007]. 

Let us note that the gradient learning rules of Eqs. (4.8) and (4.11) may have plausible neural implementations 
as they are incremental and the change in one synapse does not depend on the change in all the other 
synapses. If this latter condition is met, then we say that learning is Hebbian, or alternatively, the learning 
rule is 'local'. 

As signals should be separated and, as was argued before, separation can be facilitated if whitening 
takes place first, a decorrelation stage might be introduced. According to [Cardoso and Laheld, 1996], signals 
$y = P_y x$ become decorrelated if 

(4.13) $\Delta P_y(t+1)' \propto P_y(t)' (I - y(t) y(t)')$ 

for all times $t = 1, 2, \ldots$ and under suitable conditions. Note that here, in Eq. (4.13), and in similar 
equations later, the learning rule contains the transposed form of matrix $P_y$ and thus dimension reduction 
($\dim(y) < \dim(x)$) is possible. Intuitively, this serial update algorithm pushes the covariance matrix of $y(t)$ 
($E(y y')$, where $E(\cdot)$ denotes expectation) to become the identity. Let us remark that there are many artificial 
neuronal implementations of such algorithms [Foldiak, 1990], [Hyvarinen and Oja, 1998], [Linsker, 1999], 
[Basalyga and Rattray, 2003]. 
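
The serial whitening update of Eq. (4.13) can be sketched as follows. Transposing the rule gives $\Delta P_y \propto (I - y y') P_y$, which is the form used in the loop. The two-dimensional Gaussian input, its covariance and the learning rate schedule are assumptions made only for this illustration.

```python
import numpy as np

rng = np.random.default_rng(2)

# Assumed example input: x = L g with g standard normal, so cov(x) = C = L L'.
L = np.array([[1.5, 0.0],
              [0.6, 0.8]])
C = L @ L.T

P = np.eye(2)                      # decorrelating matrix P_y, initial guess
for t in range(20000):
    x = L @ rng.normal(size=2)     # one correlated input sample
    y = P @ x
    alpha = 1.0 / (100.0 + t)      # decaying learning rate (assumed schedule)
    # Transposed form of Eq. (4.13): dP = alpha * (I - y y') P
    P += alpha * (np.eye(2) - np.outer(y, y)) @ P

cov_y = P @ C @ P.T                # covariance of y under the true input statistics
print(np.round(cov_y, 2))          # expected to be close to the identity
```
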

For the very same reason, innovation $\hat{n}_x(t)$ should also be decorrelated. The linear transformation $n_z(t) = 
P_{n_x} \hat{n}_x(t)$ of innovation $\hat{n}_x(t)$ becomes white if tuning of $P_{n_x}$ is as follows: 

(4.14) $\Delta P_{n_x}(t+1)' \propto P_{n_x}(t)' (I - n_z(t) n_z(t)')$ 

at time $t$. 

Statistically independent sources from $n_z(t)$ can be extracted via a nonlinear modification 
[Cardoso and Laheld, 1996] of update rule Eq. (4.14). There are many variants of this nonlinear learning 
rule and we provide the simplest of these here: 

(4.15) $\Delta W_{n_z}(t+1)' \propto W_{n_z}(t)' (I - \hat{e}(t) f(\hat{e}(t))').$ 

Here, $f(\cdot)$ is an (almost) arbitrary component-wise nonlinear function. Upon convergence, the components 
of $\hat{e}(t) = W_{n_z} n_z(t)$ approximate the components of the independent source $e(t)$ apart from an arbitrary 
permutation in the order of the components, their scale and sign. Interestingly, spike timing dependent 
plasticity has been suggested to realize this nonlinear learning rule [Bell and Parra, 2005]. 
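
A small sketch may help to see the nonlinear rule of Eq. (4.15) at work. For reproducibility it uses a batch-averaged variant of the rule rather than the serial update, with $f = \tanh$, two Laplace-distributed sources and a pure rotation as the remaining mixture; all of these choices, as well as the step size and iteration count, are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
N = 5000

# Two independent, unit-variance, super-Gaussian (Laplace) sources (assumed example).
e = rng.laplace(scale=1.0 / np.sqrt(2.0), size=(2, N))
theta = 0.5
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
n_z = Q @ e                        # white mixture: only a rotation remains to be undone

W = np.eye(2)                      # separating matrix, initial guess
for _ in range(500):
    e_hat = W @ n_z
    # Sample average of f(e_hat) e_hat' with f = tanh.
    C = np.tanh(e_hat) @ e_hat.T / N
    # Transposed form of Eq. (4.15), averaged over the batch: dW = eta * (I - f(e) e') W
    W += 0.1 * (np.eye(2) - C) @ W

G = np.abs(W @ Q)                  # global transform from true sources to estimates
# Cross-talk per output: weight of the non-dominant source relative to the dominant one.
crosstalk = (G.sum(axis=1) - G.max(axis=1)) / G.max(axis=1)
print(crosstalk)
```

Small cross-talk values indicate that each output follows a single source, up to the scale, sign and permutation ambiguity mentioned above.
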

The learning equations of the whitening and separation processes have several implications concerning 
possible mappings. 

Two stages: Removal of the temporal correlations precedes the extraction of the independent factors. 

Two channels: According to Eq. (4.11), the process of learning the predictive system requires concurrent 
access to the input and the innovation. These variables may be stored separately and conveyed 
to the predictive layer via separate channels. 

Identical separation: It can be seen from Eq. (4.12) that both $\hat{e}(t)$ and $\hat{s}(t)$ are demixed by the same 
matrix, so they should be processed in the same demixing channel (violating the conjecture above) 
or there should be a mechanism that can compensate for the differences (e.g., sign and permutation 
of the components) in the linear transformations in two channels for proper demixing. 



Results II: Functional mapping of the model 

Since both the computational considerations and the anatomical findings are quite complex, we need to 
introduce some simplifications: 

Rate coding: How information is actually transmitted by the neurons is neglected. The key issue is 
that once the particular form is given, the function of the system can be analyzed as an information 
processing system. (On the controversies concerning the potential forms of information processing, 
however, see e.g. [Reyes, 2003], [Masuda and Aihara, 2007].) Our system description becomes simpler 
if we use analog values, which corresponds to the concept of rate coding as opposed to spike based 
temporal coding. The supposed low-pass filtering effect of the theta oscillation also suggests that for 
some functions fine scale temporal precision might be neglected. 

Laminar homogeneity: We neglect the complexity and richness at the cellular level and consider 
neurons as computational units. The computations may change from layer to layer, but within a 
layer the nature of the computation is the same for all neurons. This corresponds to the terminology 
of standard artificial neural networks. 

Apparent linearity: Although strong nonlinearities are present everywhere, from the subcellular level 
to the network level, there are nonetheless many cases in which the overall response of the system is 
approximately linear, see, e.g., [Linsker, 1999], [Hsu et al., 2004] and [Escabi et al., 2005] and the 
cited references. The complex contrast normalization mechanisms in visual sensory processing may 
constitute a specific example [Finn et al., 2007]. 

From now on, matrices denote synaptic weights (connection strengths) between layers and vectors denote 
the activity at a given layer. We shall slightly abuse notation and will discard the hats from our equations, 
as all learned quantities are approximations. 

Figure 1 may help to understand the modular structure of our model and its relation to the hippocampal 
region. While the left panel depicts the gross anatomy of the areas, including the different connection 
systems, the right panel of Fig. 1 shows the simplified architecture and the functional correspondences. 

The following areas of the hippocampal regions are considered in the functional mapping: deep layers of 
the medial entorhinal cortex (denoted by EC V/VI), superficial layers (EC II and EC III) and subfield CA1 
of the hippocampus. The tri-synaptic path (denoted as $W_{tri}$ in Fig. 1) involving the Dentate Gyrus (DG) 
and CA3 will be collapsed into an integrated transformation. The potential role of the DG, CA3 as well 
as the Subiculum (SUB) will be discussed in the last section. For simplicity, all areas and subfields will be 
referred to as 'layers'. 
As all computations described above require statistical characterization of input ensembles, sampling and 
processing of the sensory input and incremental tuning (learning) are also necessary. Input processing and 
learning, i.e., fine tuning of the synaptic weights that actually filter the information, are discussed separately. 

Characterization of the input to the hippocampal region 

Let $x(t) \in \mathbb{R}^{n}$ denote the analog valued postrhinal input to the entorhinal cortex at discrete time $t$, where 
$n$ is the dimension of the input. In this model, we limit ourselves to square problems, which is to say that 
$n$ may be considered as both the number of postrhinal neurons and the number of entorhinal neurons of the 
targeted layer. Let us also assume that the input follows the dynamics described above. The postrhinal input 
enters the circuitry at the superficial layers of EC through two parallel connection systems $R_{II} \in \mathbb{R}^{n \times n}$ and 
$R_{III} \in \mathbb{R}^{n \times n}$, so we assume that the number of principal cells in each superficial layer is equal and is also 
$n$. These connection systems may only transmit cortical input to the HR, so their tuning is omitted: admitting 
the lack of knowledge concerning the exact nature of the parallel postrhinal inputs, we may suppose that 
$R_{II} = R_{III} = I$, where $I \in \mathbb{R}^{n \times n}$ denotes the $n \times n$ identity matrix. When the process of learning the 
matrices is considered, a temporal index is shown in most cases. For better readability the time index is 
dropped for non-tunable matrices and in the dynamical equations. 

From EC II/III the signals are sent to the hippocampus through the direct, i.e., EC III → CA1, and the 
indirect, tri-synaptic, i.e., EC II → CA1, pathways (denoted by subscripts 'dir' and 'tri' on the right hand side 
of Fig. 1, respectively). 

Detailed correspondence between the functional model and the neural layers of the HR 

The formal description has some direct consequences concerning the potential roles of the different layers 
of the HR. First, it is obvious that innovation (that is, the comparison of the predicted and actual inputs) 
can only be stored in a layer that not only receives the input, but is also the target of inhibitory feedback. 
Due to its widespread inhibitory network, EC II is assigned to hold the innovation. The activity at EC II is 
as follows: 

(4.16) $n_z(t+1) = R_{II} x(t+1) + Q_{II} h(t) - K y(t),$ 

where $y(t)$ and $h(t)$ denote the activity at EC III and EC V/VI, respectively. (Roman subscripts of the 
connection matrices denote the number of the targeted layer.) Connections from EC III to EC II, denoted by 
$K$, are assumed to be mostly inhibitory. The reason for this assumption is that the vast majority of the deep 
to superficial connections are excitatory and mostly target principal cells in EC II [van Haeften et al., 2003] 
and the cortical inputs are also of excitatory nature. In turn, $K$ is the candidate connection system that 
effectively targets the inhibitory network of EC II. Here, the role of $Q_{II}$ is to whiten the innovation, whereas 
the role of $K$ is to ensure that the emerging activity pattern is indeed proportional to the required innovation. 

Equation (4.16) and the connectivity of the HR imply that $y(t)$ should be proportional to the input 
and is made of two terms from bottom-up and top-down contributions. The activity of EC III is thus the 
following: 

(4.17) $y(t) = R_{III} x(t) + Q_{III} h(t),$ 

where $Q_{III}$, in accordance with the redundancy reduction principle, is assumed to decorrelate the 
activity at the targeted layer, EC III. However, decorrelation of quantity $y(t)$ may influence (distort) the 
innovation in EC II. This raises some doubts, because quantity $n_z(t+1)$ might be contaminated by predictable 
components, or its whiteness might be spoiled. In turn, tuning of matrix $K$ should somehow counteract both 
problems under the constraint that learning is Hebbian. The solution to this threefold problem is an emerging 
property in our model. 

As was noted earlier, CA1 has a central location since it is targeted by both layers EC III and EC II via 
$W_{dir}(t) \in \mathbb{R}^{n \times n}$ and $W_{tri}(t) \in \mathbb{R}^{n \times n}$, respectively: 

$s(t) = W_{dir}(t) y(t),$ 

where $s(t) \in \mathbb{R}^{n}$ denotes the activity of CA1 if its driving input is projected from EC III, and 

(4.18) $e(t) = W_{tri}(t) n_z(t),$ 

where $e(t) \in \mathbb{R}^{n}$ denotes the activity of CA1 if its driving input is projected from EC II. Following 
the proposal of [Lorincz, 1998], [Lorincz and Buzsaki, 2000] and supported by the experimental findings of 
[Redish et al., 2001], independent components should be expressed in CA1. In turn, we believe transformations 
$W_{dir}(t)$ and $W_{tri}(t)$ realize the actual signal separation and provide approximate independent components. 
We note that according to Eq. (4.2) $e(t)$ should be equal to the innovation of $s(t)$. However, unlike 
in the superficial layers, there are no recurrent collaterals in CA1. This means that for properly tuned $Q_{II}$, 
$Q_{III}$ and $K$, the two bottom-up transformations, i.e., $W_{tri}$ and $W_{dir}$, should become effectively identical in 
the absence of recurrent collaterals. 

CA1 signals may leave the loop through the subiculum or they may be sent back to the deep layers of EC 
via the connection system denoted by $V \in \mathbb{R}^{n \times n}$. (On the intriguing properties of $V$ (not modeled here), 
see [Naber et al., 2001].) 

In line with [Lorincz et al., 2002], a central function of the deep layers of EC may be pattern completion. 
However, as was already noted, forcing independence does not support pattern completion. It is also known 
that activity patterns of the deep layers of EC are not in fact independent [Sargolini et al., 2006]. This 
implies that 'remixing' of the components is advantageous. Of the many possibilities, whitening seems the 
most straightforward transformation, as it does not increase the number of transformations within the EC-HC 
circuitry. The resulting patterns may show higher-order correlations supporting the task of pattern 
completion. Since the internal predictive system is based on the intensive use of recurrent connections, only 
CA3 and EC V/VI may be considered. If our assumption about the roles of the superficial layers is valid, 
then EC V/VI should realize the predictive system since CA3 is not supposed to receive significant input 
from EC III. 

Consequently, the activity at EC V/VI can be written as: 

(4.19) $h(t+1) = M_h h(t) + n_h(t),$ 

where predictive system $M_h$ can propagate activity $h(t)$ in time, $h(t) = V s(t)$ and $n_h(t) = V e(t)$. In 
addition to conveying information from CA1, $V$ is responsible for the decorrelation of the activity patterns. 
$M_h$ is an approximation of the dynamical model underlying the observations (see Eq. (4.2)). The queuing 
of the arrival of the two different inputs ($s(t)$ and $e(t)$) requires a mechanism that can maintain activity 
long enough to enable integration. Experimental findings on gradually modifiable persistent activity in EC 
V [Egorov et al., 2002] may support this proposal. 

At last, the deep layers project back to EC II and EC III via $Q_{II}$ and $Q_{III}$, respectively. 

Learning processes 

For different reasons, 3 connection systems are assumed to decorrelate the activity of their targeted layer: 
$Q_{II}$, $Q_{III}$, and $V$. Their tuning follows the form given in Eqs. (4.13) or (4.14). For example, learning of 
$Q_{II}$ can be given as: 

(4.20) $\Delta Q_{II}(t+1)' \propto Q_{II}(t)' (I - n_z(t) n_z(t)'),$ 

where $n_z(t)$ is the emerging activity of the targeted layer, EC II. 

To arrive at the right form of innovation, connections between EC III and EC II need to be tuned. The 
learning rule of $K(t)$ is supposed to satisfy a Hebbian form, similar to Eq. (4.9): 

(4.21) $\Delta K(t+1) \propto n_z(t+1) y(t)'.$ 

This is an appropriate learning rule, because it performs gradient descent on the cost function 

(4.22) $J(K) = \frac{1}{2} \sum_t |R_{II} x(t+1) + Q_{II} h(t) - K y(t)|^2 = \frac{1}{2} \sum_t |n_z(t+1)|^2,$ 

which is the squared Euclidean norm of $n_z$. In this expression each term is a linear transform of $x$ with different 
time lags. The result of the learning rule is that, apart from an arbitrary linear transformation, 

$K y(t) = Q_{II} h(t) + R_{II} \hat{x}(t+1)$ 

is satisfied in all instances. This is the net result, i.e., $n_z(t)$ is indeed a linear transform of innovation $\hat{n}_x(t)$. 
Quantity $n_z(t)$ will be white given the learning rule for $Q_{II}$ detailed in Eq. (4.20). In sum, learning rule 
Eq. (4.21) is Hebbian and adjusts the inhibitory contribution until $n_z(t)$ becomes a linear transformation of 
the innovation, subject to the constraint that both $y(t)$ and $n_z(t)$ are white. 

Separation takes place in both the direct and the indirect pathways, so $W_{dir}$ and $W_{tri}$ should undergo 
tuning similar to Eq. (4.15). At this point some remarks are in order. We expect to have two separate 
channels, one for the input and one for the innovation, which can basically reverse the mixing effect of the 
very same mixing process (see Eq. (4.1)). We have also seen that both separation processes would probably 
end up creating approximately independent components in the same layer (CA1). First, it is necessary to 
ensure that learning in the two separation pathways converges to approximately the same solution. Second, 
it is necessary to schedule the activity at CA1 to avoid interference between the patterns corresponding 
to the independent components of the input or the innovation. Regarding the interaction between $W_{dir}$ 
and $W_{tri}$, it is intriguing that while the original problem of ICA (that is, when the mixing process and 
the components are unknown) is truly unsupervised, by constraining the outputs of the tunable matrix to 
some prescribed outputs the learning algorithm becomes supervised. Thus, the two matrices may become 
identical if one channel dominates (supervises) the other. Physiological considerations seem to suggest a 
possible mechanism. 

Regarding separation, in the beginning the faster direct pathway may supervise the indirect one by 
providing approximately independent components in CA1. We note that there is a temporal coordination 
between the firing of the neurons that send information through the direct and the indirect paths 
[Dragoi and Buzsaki, 2006]. It is also possible that supervising signals may reach CA1 at one phase of the 
theta oscillations, while the signals from the tri-synaptic pathway may reach CA1 at the other phase. Another 
argument is that although place fields in CA1 begin to stabilize early (compared to the place fields in 
CA3) and even without input from the tri-synaptic route, full stabilization takes much longer. We suggest 
that the two routes work together. The early stabilization results in approximate independent components 
if the signal from EC III is contaminated by large temporal correlations. The task of the indirect route may 
be to diminish this kind of temporal dependence and to proceed with the separation of the sources, but this 
is a slower process. 

Following our hypothesis, tuning of $W_{dir}(t)$ and $W_{tri}(t)$ may assume two different forms during the course 
of learning: 

(4.23) $\Delta W_{dir/tri}(t+1)' \propto W_{dir/tri}(t)' (I - s(t) f(s(t))')$ 

(4.24) $\Delta W_{dir/tri}(t+1)' \propto W_{dir/tri}(t)' (I - e(t) f(e(t))'),$ 

where $f(\cdot)$ is an (almost) arbitrary component-wise nonlinear function. 

In the formal model we have seen that all these transformations are required to provide the right information 
for the internal predictive model. However, this model also needs tuning in order to match the observed 
signals. 

The approximation of predictive matrix $M_h$, as with all predictive matrices in the model, can be written 
as follows: 

(4.25) $\Delta M_h(t+1) \propto n_h(t+1) h(t)'.$ 

This rule trains matrix $M_h$ to optimize prediction in the Euclidean norm. Due to the scheduled 
arrival, we need to suppose that the time window is broad enough to enable interaction of the transformed 
input signal and the innovation. As we see, training is Hebbian, but a detailed mechanism that would 
actually be able to carry out this tuning is missing. Nevertheless, we conjecture that the double loops of 
the direct and indirect pathways have a fundamental role in channeling the right information at the right 
time. It is worth noting that this assumption is also supported by the experimental finding that activity 
in CA3 under one theta oscillation (50-80 ms) may correspond to 1 second of the external sensory flow. 
Unfortunately, available experimental data are not sufficient to better model this interplay. 

In summary, if all transformations are optimally tuned, then (1) temporal correlations $F$ are learnt and 
represented in the internal model through matrix $M_h$, (2) the hidden processes $h$ can be estimated by the 
learnt model and (3) the true independent causes $e$ can also be revealed. Note that two main goals are 
achieved: the independent causes ($e$) are revealed up to an arbitrary permutation, scale and sign, and the 
predictive matrix $F$ is learnt up to a linear transformation. 
Figure 2: (a): Circular maze, diameter: 2 m, with a short sample trajectory. Step size varies between 11 and 
55 cm. (b): Sample input to the loop in the form of an activity map within the maze (see Eq. (4.27) for 
details). Activity map is shown in arbitrary units. 

In the next section we turn back to the original problem of the emergence of particular spatial activity at 
different parts of the HR. In the simulations the transformations assigned to different parts of the loop are 
implemented and applied on structured high dimensional inputs containing spatial information. The goal is 
to study whether the emergent activity at the different modules corresponding to, e.g., CA1 and EC II/EC III 
resembles that found experimentally. 



Results III: Simulations 

We present a series of simulations with inputs of increasing complexity. The more realistic the inputs, 
the more complex the computations that are required to extract spatial information. In doing so, the role 
of different modules can be highlighted. 

In our sample simulations a virtual rat has explored a 2 m wide, open-field circular maze. Similar results 
were reached using a square maze. The path has been generated as follows: the rat runs on a linear path at 
a constant speed and makes a small random turn at each step with a given chance. It also makes a random 
turn if it 'senses' that it may collide with the wall. Input sampling has been fixed to 55 cm. The length of 
this random trajectory and input sampling were chosen to get a fair coverage of the full area of the maze with 
a reasonable number of samples. The maze and a sample trajectory are shown in Fig. 2. Inputs corresponding 
to turns may only be interpreted by higher order autoregressive processes for which the order would be about 
the average number of steps in a single direction. As the implemented internal model assumes first order 
processes (see the comment at Eq. (4.2)), such inputs have been excluded. We shall come back to this point 
in a later section. 

The most restrictive approximation in our simulations is that the input contains information about the 
local cues only; no distal information is included. One might think that the input is a mixture of smells that 
differs from point to point. This local nature implies that parametric maze distortions cannot be modeled in 
this framework. On the other hand, this simplification excluded any artifact that would result from arbitrary 
modeling of low-level sensory processing. Instead, we simply mimicked postrhinal ('parahippocampal' in 
primates) [Burwell and Hafeman, 2003] inputs. In contrast to perirhinal input [Eacott and Gaffan, 2005], 
postrhinal input is assumed to reflect changes of spatial properties or directly carry spatial information 
(albeit in weak correlations, [Fyhn et al., 2004]). Such spatial dependence of the postrhinal activity was 
approximated by first creating $n$ Gaussian patches with each Gaussian having a maximum amplitude of 1: 

(4.26) $g_i(p) = \exp\left(-\frac{|p - c_i|^2}{2 \sigma_i^2}\right),$ 

where $p \in \mathbb{R}^{2}$ denotes the coordinate vector of the rat, $c_i \in \mathbb{R}^{2}$ is the coordinate vector of the center of the 
$i$th Gaussian, and $i \in \{1, \ldots, 1000\}$. Centers were drawn from the uniform distribution over the full maze 
while $\sigma_i$ were uniformly drawn from the range [20 cm, 40 cm]. 

Input $x$ was created by using a random, binary mixing matrix $G \in \{0, 1\}^{1000 \times 1000}$ and the set of the 
Gaussians: 

(4.27) $x(t) = G g(p(t)),$ 
where $p(t)$ denotes the coordinates of the rat in the maze at time $t$ and the $i$th component of vector $g \in \mathbb{R}^{1000}$ 
is $g_i(p(t))$ at time $t$. Each row of matrix $G$ contains 20 positive non-zero elements on average. The resulting 
activity map for a single component of $x(t) \in \mathbb{R}^{1000}$, i.e., for one of our 'sensors', is shown in Fig. 2(b). 
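
The exploration path described earlier in this section (straight runs at constant speed, occasional small random turns, and a random turn when the next step would leave the maze) can be sketched as follows. The function name `random_walk`, the turn probability, the maximal turn angle and the step length used in the example call are hypothetical choices for illustration, not parameters reported for the simulations.

```python
import numpy as np

def random_walk(n_steps, step, radius=1.0, p_turn=0.3, max_turn=0.5, seed=0):
    """Random exploration of a circular maze centred at the origin (units: metres).

    p_turn and max_turn (radians) are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    pos = np.zeros(2)
    heading = rng.uniform(0.0, 2.0 * np.pi)
    path = [pos.copy()]
    for _ in range(n_steps):
        if rng.uniform() < p_turn:                       # small random turn
            heading += rng.uniform(-max_turn, max_turn)
        nxt = pos + step * np.array([np.cos(heading), np.sin(heading)])
        while np.hypot(*nxt) > radius:                   # would hit the wall: new heading
            heading = rng.uniform(0.0, 2.0 * np.pi)
            nxt = pos + step * np.array([np.cos(heading), np.sin(heading)])
        pos = nxt
        path.append(pos.copy())
    return np.array(path)

path = random_walk(1000, step=0.05)   # step length here is an arbitrary example value
```
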

In simulation #1 the input to the model was exactly as defined in Eq. (4.27). 
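
A minimal sketch of such an input generator is given below. It assumes the common Gaussian form $\exp(-|p - c_i|^2 / (2\sigma_i^2))$ for the patches and draws each entry of the binary matrix $G$ independently with probability 20/1000, which yields 20 non-zero elements per row on average; both choices, as well as the names `gaussians` and `sensory_input`, are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000          # number of Gaussian patches and of input units
radius = 1.0      # maze radius in metres (2 m wide maze)

# Centres uniform over the disk: the square root makes the density uniform in area.
r = radius * np.sqrt(rng.uniform(size=n))
angle = rng.uniform(0.0, 2.0 * np.pi, size=n)
centers = np.stack([r * np.cos(angle), r * np.sin(angle)], axis=1)
sigma = rng.uniform(0.2, 0.4, size=n)                 # widths between 20 cm and 40 cm

# Random binary mixing matrix with about 20 ones per row (assumed sampling scheme).
G = (rng.uniform(size=(n, n)) < 20.0 / n).astype(float)

def gaussians(p):
    """Vector g(p) of patch activations at position p; each patch peaks at 1."""
    d2 = np.sum((centers - np.asarray(p, dtype=float)) ** 2, axis=1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def sensory_input(p):
    """Input x = G g(p) of Eq. (4.27) at position p."""
    return G @ gaussians(p)

x = sensory_input([0.0, 0.0])
```
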

In simulation #2 50 more units were added, so the dimension of the input $x(t)$ was 1050. The new units 
'sensed' directions and had no spatial dependence. The direction sensitivity has been defined as: 

(4.28) $x_i = f_i(\phi) = \max(0, \cos(\phi - \phi_i)),$ 

for $i \in \{1001, \ldots, 1050\}$, where $\phi$ denotes the direction between the last and the current positions and $\phi_i$ 
denotes the direction for which the $i$th component ($1000 < i \le 1050$) is the most sensitive. This particular 
choice results in broadly tuned ($\sim \pi/2$) directional activities. 
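
The rectified cosine tuning of Eq. (4.28) can be written down in a few lines. The preferred directions $\phi_i$ are taken to be evenly spaced here, which is an assumption, and `direction_units` is a hypothetical helper name.

```python
import numpy as np

n_dir = 50
# Preferred directions phi_i of the 50 units; even spacing is an assumption.
preferred = np.linspace(0.0, 2.0 * np.pi, n_dir, endpoint=False)

def direction_units(p_prev, p_curr):
    """Responses max(0, cos(phi - phi_i)) for one step from p_prev to p_curr."""
    dx, dy = np.asarray(p_curr, dtype=float) - np.asarray(p_prev, dtype=float)
    phi = np.arctan2(dy, dx)          # direction between the last and current positions
    return np.maximum(0.0, np.cos(phi - preferred))

resp = direction_units([0.0, 0.0], [1.0, 0.0])   # movement along +x, i.e. phi = 0
```
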

In simulation #3, instead of mixing the units that carry different information, we used 1000 conjunctive 
inputs that carried spatial and directional information: 

(4.29) $x_i(t) = f_i(\phi(t)) \, [G g(p(t))]_i,$ 

where $\phi(t)$ is the direction of the rat at time $t$. 

Last, in simulation #4, we used low-pass filtered versions of the inputs of simulation #3: 

(4.30) $x^{(tc)}(t+1) = (1 - \alpha) x^{(tc)}(t) + \alpha x(t+1),$ 

where superscript 'tc' stands for 'temporally convolved'. This is essentially the simplest autoregressive 
process regarding Eq. (4.2). 
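
The low-pass filter of Eq. (4.30) is a first-order recursive average. The sketch below assumes a zero initial state unless one is supplied; the function name `low_pass` and the example value of $\alpha$ are illustrative.

```python
import numpy as np

def low_pass(x, alpha, x0=None):
    """Apply x_tc(t+1) = (1 - alpha) x_tc(t) + alpha x(t+1) along the first axis.

    x has shape (T, ...); the initial state x0 defaults to zeros (assumption).
    """
    x = np.asarray(x, dtype=float)
    state = np.zeros(x.shape[1:]) if x0 is None else np.asarray(x0, dtype=float)
    out = np.empty_like(x)
    for t in range(x.shape[0]):
        state = (1.0 - alpha) * state + alpha * x[t]
        out[t] = state
    return out

filtered = low_pass(np.ones((3, 2)), alpha=0.5)
print(filtered[:, 0])   # 0.5, 0.75, 0.875: the output approaches the constant input
```
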

Spatial analysis 

As opposed to real spiking data, linear transformations may give rise to negative signals. In turn, the 
correspondence between the unit activity values after each transformation and the neurons' responses is not 
straightforward. In order to generate the activity maps of the input units, first we discretized the space (the 
resolution was 30 × 30, so a bin is 6.67 cm × 6.67 cm, which is comparable to [Hafting et al., 2005a]), and for each 
bin we summed up the activity measured in those steps that ended in the given bin. This spatial averaging 
smoothes out the artifacts caused by unattended spots. The activity after, e.g., decorrelation may assume 
negative values, so the data were half-wave rectified (clipped) and scaled to range [0, 1]. 
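
The binning step can be sketched as follows, assuming positions are given in metres relative to the maze centre. The helper name `activity_map` and the handling of an all-zero map are assumptions.

```python
import numpy as np

def activity_map(end_points, activity, size=2.0, bins=30):
    """Sum the activity of the steps ending in each bin, then clip and scale to [0, 1]."""
    end_points = np.asarray(end_points, dtype=float)
    idx = np.floor((end_points + size / 2.0) / size * bins).astype(int)
    idx = np.clip(idx, 0, bins - 1)                  # points on the border go to edge bins
    amap = np.zeros((bins, bins))
    # Unbuffered accumulation: repeated bins are summed correctly.
    np.add.at(amap, (idx[:, 1], idx[:, 0]), np.asarray(activity, dtype=float))
    amap = np.maximum(amap, 0.0)                     # half-wave rectification
    peak = amap.max()
    return amap / peak if peak > 0 else amap

pos = np.array([[-0.99, -0.99], [-0.99, -0.99], [0.5, 0.5]])
act = np.array([1.0, 1.0, -3.0])
m = activity_map(pos, act)   # bin [0, 0] collects 2.0; the negative bin is clipped to 0
```
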

ICA is invariant for the change of sign [Jutten and Herault, 1991]. In turn, the sign of an activity map 
has been defined by the average sign of the first 10 bins with the highest absolute value. That is, if more 
than 5 of these bins were negative, we simply flipped the sign of the map. The resulting maps were then half-wave 
rectified. We also computed the 2 dimensional normalized autocorrelation for each activity map. 

The spatial analysis of the peak activity regions for the autocorrelation image has been done 
by fitting a grid on the locally maximal points using Delaunay-triangulation [Markus et al., 1995], 
[Takacs and Lorincz, 2007]. Border vertices and nodes have been excluded from the analysis. Vertices are 
considered as internal if they belong to two triangles and nodes are internal if they only connect to other 
nodes through internal vertices. To characterize the regularity of the resulting grids, we calculated the vertex 
length and the angle distribution. Discretization, however, defines a lower bound of the edge length, 
which is about 2 bins, that is ~ 13.34 cm. Because the mean angle in Delaunay-triangulation is necessarily 
60 degrees (the three angles of every triangle sum to 180 degrees), the spread around this value (that is 
the standard deviation, or std for short) can be used to 
quantify regularity. The distribution of the mean vertex lengths and the distribution of the std of the angle 
values for the whole population have been used to compare the spatial characteristics of the input set and 
the set of the transformed signals. 
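
The angle statistics can be computed from any triangulation given as a list of point indices per triangle. The sketch below only covers the angle computation; the triangulation itself is assumed to come from elsewhere (for instance, the `simplices` attribute of `scipy.spatial.Delaunay` would have the required form). The helper name `triangle_angles` is hypothetical.

```python
import numpy as np

def triangle_angles(points, triangles):
    """Interior angles in degrees for each triangle; result has shape (n_triangles, 3)."""
    p = np.asarray(points, dtype=float)[np.asarray(triangles)]   # (m, 3, 2) corner coordinates
    angles = np.empty(p.shape[:2])
    for k in range(3):
        a = p[:, (k + 1) % 3] - p[:, k]              # the two edges meeting at corner k
        b = p[:, (k + 2) % 3] - p[:, k]
        cosang = np.sum(a * b, axis=1) / (
            np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1))
        angles[:, k] = np.degrees(np.arccos(np.clip(cosang, -1.0, 1.0)))
    return angles

# A perfectly regular (equilateral) cell gives zero spread around 60 degrees.
equilateral = triangle_angles([[0, 0], [1, 0], [0.5, np.sqrt(3) / 2]], [[0, 1, 2]])
# A right triangle has the same mean angle but a large spread.
right = triangle_angles([[0, 0], [1, 0], [0, 1]], [[0, 1, 2]])
```
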

For simulations #2, #3, and #4, direction sensitivity has also been analyzed. To show the spatial 
distribution of the direction sensitivity, we discretized the activity maps into 10 x 10 bins and in each bin 
we collected those steps that ended in that bin. Their direction, weighted by the response value at the end 
point, was then added up. The resulting directed activity values can be visualized in a 'direction-field' plot. 
In order to characterize the spatial heterogeneity of the directional selectivity, the directed values may also 
be grouped according to their direction and these lumped sum values will be presented on a polar plot. 
These analyses serve to characterize the strength of spatial heterogeneity in direction selectivity.
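A minimal sketch of the 'direction-field' accumulation is given below. It assumes a square arena with coordinates in [0, size), a trajectory given as a list of positions, and one response value per position; these names and conventions are illustrative assumptions, not the original implementation.

```python
import math

def direction_field(positions, responses, n_bins=10, size=1.0):
    """Sum response-weighted unit step directions in each spatial bin.

    positions: list of (x, y) points of a trajectory inside the arena
    responses: response of one unit at each position (same length)
    Returns an n_bins x n_bins grid (rows: y, columns: x) of (dx, dy) sums.
    """
    field = [[(0.0, 0.0) for _ in range(n_bins)] for _ in range(n_bins)]
    for (x0, y0), (x1, y1), r in zip(positions, positions[1:], responses[1:]):
        dx, dy = x1 - x0, y1 - y0
        norm = math.hypot(dx, dy)
        if norm == 0.0:
            continue  # no movement, hence no direction
        # Each step is assigned to the bin of its end point.
        i = min(int(y1 / size * n_bins), n_bins - 1)
        j = min(int(x1 / size * n_bins), n_bins - 1)
        fx, fy = field[i][j]
        # Weight the unit direction by the response at the end point.
        field[i][j] = (fx + r * dx / norm, fy + r * dy / norm)
    return field
```

Grouping the same weighted directions by angle instead of by bin would give the lumped values shown on a polar plot.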

Figure 3: Simulation #1: position dependent input. (a-d): each column shows the output of a different
decorrelating unit (PCA). First row: half-wave rectified and scaled activity maps (0: black, 1: white).
Second row: 2D autocorrelation function of the activity maps and the fitted grids. Third row: vertex angle
histogram for the fitted grids. (e-f): cumulative statistics over all grids. (e): histogram of the mean edge
length of the grids for the input set and the PCA units. (f): histogram of the standard deviation of the
vertex angles for the input set and the PCA units. (g-i): sign corrected activity maps of three separating
(ICA) units. Response is localized. (j): superimposed map of all ICA units demonstrating that the localized
units cover the full maze.



Simulation #1: direction-independent input

Standard PCA reduced the dimension of the decorrelated input from 1000 to 852, as the remaining
eigenvalues were below the level of numerical precision. The first 127 dimensions carried 95% of the total variance
and the first 203 carried 99%. The resulting activity maps are comparable to the firing rate maps shown in
[Hafting et al., 2005a].
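The number of dimensions needed to carry a given fraction of the variance follows directly from the sorted eigenvalue spectrum. The sketch below shows the computation on an arbitrary list of eigenvalues; the function name is hypothetical and the eigenvalues of the actual simulation are not reproduced here.

```python
def n_components_for_variance(eigenvalues, fraction):
    """Smallest number of leading components whose eigenvalues sum to
    at least `fraction` of the total variance."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    running = 0.0
    for k, ev in enumerate(ordered, start=1):
        running += ev
        if running >= fraction * total:
            return k
    return len(ordered)
```

Applied to the spectrum of the simulation, `fraction=0.95` and `fraction=0.99` would correspond to the 127 and 203 dimensions quoted above.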

Subfigures 3(a-d) show the spatial activity of 4 sample units. The first row depicts the clipped activity
maps. The second row shows the 2D autocorrelation functions of the activity maps with the fitted grids
superimposed. Note that the peak-to-peak distance varies over a broad range. The third row represents the
vertex angle distribution of the corresponding grids. A narrower distribution means more uniform angles and
thus a more symmetric grid. Subfigures 3(e) and 3(f) show cumulative statistics concerning the grids. Only
the grids belonging to the 220 largest eigenvalues have been used in these analyses, as the rest are mostly noise.
Subfigure 3(e) compares the distribution of the mean edge length of the fitted grids for the input activity
maps and the activity maps of the decorrelated inputs. Subfigure 3(f) compares the distribution of the
standard deviation of the vertex angles of the fitted grids. Again, for hexagonal-like grids, the smaller the
standard deviation, the larger the regularity. While the experimental data available to us are not sufficient for
a quantitative comparison, these grids cover a range similar to those of [Hafting et al., 2005a].
These diagrams show that the 'gridness' of the activity increased markedly due to the
decorrelation step. Since the creation of the input is essentially equal to a random mixture, the effect we
show here is not an artifact.

For the sake of completeness, the effect of separation is also shown. It was demonstrated in
[Takacs and Lorincz, 2007] and [Franzius et al., 2007] that inputs with grid-like spatial activity patterns can be
transformed into more localized 'place cell'-like activity patterns by imposing independence or sparseness on
the components of activity. As decorrelation in our simulations already yields grid activity, separation (into
independent components) naturally resulted in unimodal place-cell-like activity maps. Subfigures 3(g-i) show
three sample units and subfigure 3(j) depicts the superposition of the activity maps of 60 independent units.
The resulting coverage may be interpreted as a coarse-grained discretization of a low-dimensional space (in our
particular case, the relevant dimension is 2).

Simulation #2: Mixture of direction-independent and position-independent inputs 

After decorrelation most grid structures were almost identical to those of simulation #1 (Fig. 4(a-d)).
However, many units also showed a certain degree of direction selectivity. This ensemble of units
with different dependencies shows an apparent conjunctive representation of position and direction. A similar
representation was found in dMEC III-V [Sargolini et al., 2006]. However, separation after decorrelation
shows (Fig. 4(e-g)) that there are now 2 relevant subspaces: direction and position. While
most cells showed place-cell-like activity (e.g. Fig. 4(e)), the rest of the units showed no spatial dependence:
they were selective only for direction. For each subspace, separation essentially resulted in a coarse-grained
discretization. We emphasize that the decoupling of the directional information did not require any predictive
mechanism in this case. This can be explained by the fact that a subgroup of the original inputs contained
explicit directional information, so their activity statistics differ from those of the other units in the
linear mixture.

Simulation #3: Position and direction dependent inputs 

For true conjunctive inputs (i.e. all input units show both position and direction selectivity), two changes 
can be seen in the activity maps of the decorrelating units (Fig. 5(a-d)). First, all units inherited the
conjunctive property showing some direction selectivity on top of the grid-like spacing. Second, regularity 
and symmetry properties degraded in both subspaces compared to those of Fig. 4(a-d). The output of the
separating units (ICA), in contrast to Simulation #2, now showed significant direction selectivity in all units as well
(Fig. 5(e-f)). This implies that the ICA units are again local, but now in 3 dimensions: ICA cells basically
discretize the Cartesian product of the 2-dimensional maze and the 1-dimensional space of directions.

According to our hypothesis, the internal model realized in the deep layers of the entorhinal cortex is
responsible for restoring the decoupling between predictable (i.e., direction selective) and nonpredictable
information. In turn, we expect to see weaker direction selectivity if the predictive model is also part of the
computations (the circuitry now works on the difference (innovation) between the input and the expectation
of the system's internal model, see Eq. 4.16). If the internal model is correctly tuned, then directional
sensitivity should disappear from the innovation, resulting in clear hexagonal spacing again. Furthermore, we
should see a diminished direction selectivity in place cell activity as well. Indeed, the directional sensitivity of
the innovation of the decorrelated inputs decreased considerably, although the hexagonal structure of Fig. 5
did not improve significantly (results are not shown). In addition, separation of the innovation yielded local
activity with diminished direction selectivity, as predicted (Fig. 6).
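The notion of innovation can be illustrated on the simplest possible case, a scalar first order autoregressive signal. The sketch below is a toy example under that assumption; `f_hat` stands for the internal model's estimate of the predictive coefficient, and all names are hypothetical rather than taken from the model equations.

```python
def simulate_ar1(f, x0, noise):
    """Generate x_t = f * x_{t-1} + noise_t, starting from x0."""
    x = [x0]
    for n in noise:
        x.append(f * x[-1] + n)
    return x

def innovations(signal, f_hat):
    """Innovation of a scalar AR(1) model: e_t = x_t - f_hat * x_{t-1},
    i.e. the input minus the expectation of the internal model."""
    return [x_t - f_hat * x_prev for x_prev, x_t in zip(signal, signal[1:])]

# With a correctly tuned model the predictable part is removed and
# the innovation equals the driving noise.
noise = [0.5, -1.0, 0.25, 2.0]
signal = simulate_ar1(0.5, 1.0, noise)
residual = innovations(signal, 0.5)
```

If `f_hat` differs from the true coefficient, part of the predictable (slowly varying) component leaks into the residual, which is the situation of a mistuned internal model.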

Simulation #4: Temporally convolved position and direction dependent inputs 

As we noted earlier, theta oscillation may be responsible for time compression in the HR, which, from the
computational point of view, corresponds to temporally convolved inputs. The resulting moving average
would probably highlight those factors that change less abruptly. In our case, apart from the turns taken in
order to avoid collisions, directional information is either constant or varies slowly over a longer time interval.
In turn, we expect that separation on the temporally convolved inputs collected during linear motion
would yield stronger direction sensitivity with larger and less precise place fields (Fig. 7), similar to those
found in the different areas of the subicular complex [Sharp, 1996]. Direction selectivity has a much finer
scale than the half-rectified cosine dependence used for the inputs.

Figure 4: Simulation #2: mixture of purely position and purely direction selective inputs. (a-d): columns
correspond to the output of different decorrelating units (PCA). First row: half-wave rectified and scaled
activity maps. Second row: 2D autocorrelation function of the activity map and the fitted grid. Third row:
spatial distribution of the direction selectivity is shown on a square grid of size 10x10. Fourth row: overall
direction selectivity in the form of a polar plot. (e-g): columns correspond to the output of different separating
units (ICA). First row: sign flipped, half-wave rectified activity maps. Second row: spatial distribution of
the direction selectivity is shown on a square grid of size 10x10. Third row: overall direction selectivity in
the form of a polar plot.


Figure 5: Simulation #3: position and direction selective inputs. (a-d): columns correspond to the output
of different decorrelating units (PCA). First row: half-wave rectified and scaled activity maps. Second row:
2D autocorrelation function of the activity map and the fitted grid. Third row: spatial distribution of the
direction selectivity is shown on a square grid of size 10x10. Fourth row: overall direction selectivity in the
form of a polar plot. (e-g): columns correspond to the output of different separating units (ICA). Fifth row:
sign flipped, half-wave rectified activity maps. Sixth row: spatial distribution of the direction selectivity is
shown on a square grid of size 10x10. Seventh row: overall direction selectivity in the form of a polar plot.



5. Discussion 

In this last section, we analyze the simulation results and re-evaluate the functional mapping of our 
computational model. These considerations then lead to some predictions about the HR. We conclude with 
a discussion of a few issues still left unresolved and identify possible further improvements. 

Figure 6: Innovation on the decorrelated conjunctive inputs. (a-c): columns correspond to the output
of different separating (ICA) units. First row: half-wave rectified activity maps. Second row: spatial
distribution of the direction selectivity is shown on a square grid of size 10x10. Third row: overall direction
selectivity in the form of a polar plot.

Figure 7: Separation on temporally convolved, position and direction dependent inputs. (a-c): each column
corresponds to a different ICA output unit. First row: sign flipped and half-wave rectified activity maps.
Second row: spatial distribution of the direction selectivity is shown on a square grid of size 10x10. Third
row: overall direction selectivity in the form of a polar plot.



Interpretation of the simulation results 

Although our model construct is based on general ideas about efficient representation of sensory events,
when applied to spatially anchored inputs it has shown some intriguing properties that correspond directly
to experimental data.

The model correspondences are already supported by the first simulation, in that grid-like activity
appeared in exactly those modules whose neural substrates were reported to present this particular
activity. Once grid-like activity is present, forcing independence results in localized activity, as was shown,
for example, in [Franzius et al., 2007]. What is more interesting, though, is that reciprocity (i.e., place cells
are needed to get stable grid cells) can also be explained by the loopy structure of our model. Another
observation is that the weak overlap among the resulting place fields can be considered as a discretization of
the space. Similarly to what was first found in [Lorincz et al., 2001], this is what ICA seems to do if there
is a low-dimensional space behind the high-dimensional inputs.

In Simulation #2, directional information was introduced by mixing the outputs of position dependent and
purely direction dependent units. Let us remark here that no such labeling as position or direction was given
in the model. The proposed algorithms simply extract statistical properties of the input ensemble. This
simulation yielded two interesting results. The first is that grid cells now show conjunctive behavior
as well: in addition to pure position dependence, many units appear to depend on both space and direction.
The second is that independent components now either show spatially localized activity with no
direction selectivity or demonstrate clear direction selectivity without any particular position dependence. This is
a consequence of our input preparation: we mixed two signals with different statistical properties
and ICA separated them. ICA thus discretizes two distinct subspaces separately: the space of position and
the space of direction. This is not unlike how information flow is supposed to take place between CA1 and
the subiculum. It is, however, interesting that despite the significant temporal correlation carried by directional
information, separation alone (i.e. without activation of the predictive system) was able to decouple these
different pieces of information. These observations also show the robustness of the ICA algorithm.

In Simulation #3, all input units had position and direction dependence (sampled from the product
space of positional and directional information). In this case decorrelation and extraction of the components
yielded distorted, less regular grid activity, and all independent components showed directional dependence
as well. Separation now discretized the 3-dimensional conjunctive space of position and direction. However,
turning on the predictive system considerably lessened the directional selectivity of the EC II grid units, in
accordance with the experimental findings. In turn, the extracted independent components of the EC II
outputs also showed less directional dependence.

In Simulation #4 we were interested in what happens if correlations at different time scales (directional 
information and temporal convolution) are present in the input. The most important result here is that 
convolution ('moving-average') actually facilitates the decoupling: narrowly tuned direction selective units 
appeared with large, less precisely defined place fields. This kind of activity is similar to what has been 
reported on the subicular complex. 
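Why a moving average favors slowly changing factors can be seen on a toy signal. The sketch below is only an illustration of temporal convolution with a boxcar window, under the assumption of a causal window of fixed length; it is not the convolution kernel used in the simulations.

```python
def moving_average(signal, window):
    """Causal moving average: each output is the mean of the last
    `window` samples (outputs start once the window is full)."""
    out = []
    for t in range(window - 1, len(signal)):
        chunk = signal[t - window + 1 : t + 1]
        out.append(sum(chunk) / window)
    return out

# A constant (slow) component of 1.0 plus a fast component alternating
# between +1 and -1: averaging over 2 samples removes the fast part.
mixed = [2.0, 0.0, 2.0, 0.0]
smoothed = moving_average(mixed, 2)
```

In this example the slow component survives the averaging unchanged while the rapidly alternating component cancels, which is the sense in which convolution emphasizes directional information that is constant over longer intervals.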

Consequences and predictions 

The results presented here have some direct consequences for the mapping. As decorrelation
yields grid-like activity and such activity has been found in the deep EC layers as well, the arbitrariness in the
choice of the CA1 afferents of the EC deep layer (matrix V) may be resolved: decorrelation seems appropriate.
Another consequence is that, depending on the temporal structure of the input, separating transformations
may efficiently channel the information after these particular correlations have been realized. Although we
omitted the modeling of the subicular complex, this observation may explain the existence of the distal/proximal loops
between CA1, subiculum and EC [Gigg, 2006].

Before presenting our conjectures and predictions, let us recap the logic behind them. First we claimed 
that a memory system is efficient if the resulting representations (1) support a predictive internal model of 
sensory events, (2) can be interpreted in a probabilistic framework to cope with uncertainties and (3) can 
be factored to maintain the redundancy reduction principle, but also help reveal relevant subspaces. These 
high level functional motivations lead to a computational model that can explain the sensory input in terms 
of independent causes and can also predict the temporal changes of these causes and their interactions. 
The predictive faculty of the proposed structure is realized in an internal model that can take into account 
intrinsic (e.g. self-motion induced) and extrinsic changes in the observed signals. It is worth noting here 
that such distinctions are only meaningful if control of the intrinsic changes (for instance, changing the pace 
through appropriate motor commands) is possible. The required computational stages form a loop in which 
learning (tuning) and functioning are tightly coupled. The loopy structure implies that the HR connects the 
downstream and upstream information flow between the efferent and afferent pathways.

Next we attempted to map the proposed functional model onto the neuronal substrate by enumerating
supporting anatomical, physiological and behavioral data. Due to the complexity of the problem, a series of
simplifications had to be introduced. Our large scale functional model ignores fine temporal scales, thus
(1) rate based coding of information is sufficient. We also reduce the difficulty by focusing only on (2)
linear transformations (apart from the rectification of the neuronal outputs), although each stage can
also be extended to be nonlinear. We intend to provide (3) a network level description only, in which
the transformations are carried out by similar computational units. These considerations, together with
the validating simulations, which were specifically aimed at studying spatial dependence, may lead to the
following conjectures:

(1) The core transformation of the circuitry may be seen as a realization of independent process analysis 
which provides a two stage solution to recover hidden components as well as the dynamics. 

(a) In one stage separation of independent (hidden) causes and their corresponding subspaces may 
take place. The HC plays a crucial role in mapping independent coordinates such as position 
and direction to different areas. Grouping of the components of the non-independent factors 
may occur by using the information about their 'non-independence', i.e., within the subspaces
themselves. 

(b) In the other stage, a predictive system is implemented that can be fine tuned to fit the temporal 
scale of the evolution of the observed signals. Due to the interplay between these two functions, 
decorrelation and separation take place repeatedly along both the direct and the tri-synaptic 
routes. 

(2) Depending on the capacity of the available resources, separation can be seen as a means (1) of finding 
separable subspaces and (2) of discretizing these low dimensional subspaces. Position and direction, 
for instance, can be seen as two complementary but independent pieces of information. In turn, 
separation has a central role in shaping the responses of both the place and the head direction cells. 

(3) The predictive internal model is maintained by EC V/VI.

(4) The innovation term is formed in EC II through a complex interaction of at least 3 different areas
projecting to the given layer.

(5) The main input is held in EC III.

(6) The innovation term is the net result of the comparison of the expected input produced by the 
predictive system and the real input. Such comparison is made possible through the activation (by 
the EC III to EC II connections) of the widespread inhibitory network of EC II. 

(7) For both the innovation and the input, bottom-up and top-down connections work in concert to 
achieve decorrelation. 

(8) Actual separation is carried out in both the direct and the tri-synaptic pathways, resulting in
independent activity in CA1. The two processes interact during learning as well as functioning.

(9) Forcing independence may interfere with prediction, so some remixing is needed. Whitening seemed 
to be a natural choice, and this was supported by the comparison of the simulation results (grid 
activity emerges by decorrelation) with the experimental findings (grid activity can be found in all 
layers of dMEC). 

(10) The loopy structure and the whitening role of EC deep to EC superficial connections explain the
fact that when the HC is removed, signals of both superficial layers of EC change [Fyhn et al., 2004].



The resulting mapping is an improvement over the one proposed in [Lorincz and Buzsaki, 2000]
[Lorincz et al., 2002], where decorrelation was assigned to CA3. This modification is necessary
[Takacs and Lorincz, 2007], because in applying decorrelation to spatially defined inputs grid-like activity
emerges, and such grids were found in the entorhinal cortex and not in CA3. In our model the
main role of the deep-to-superficial connections is whitening, whereas the comparator role of EC layer II
[Lorincz and Buzsaki, 2000] [Lorincz et al., 2002] has not been modified.



Regarding spatial information, simulations revealed that decoupling of directional and positional
information is viable in our model framework. If the neuronal mapping is correct, this decoupling defines the
interplay between the hippocampus proper (responsible for shaping and maintaining primarily positional
information) and the subicular complex (responsible for directional information). It may also explain the
necessity of the two parallel routes. Because the proximal and distant targets differ in the two areas, it is
possible that computations are similar at CA1 and at the subiculum.
These considerations imply the following predictions: 

Subspace separation: At the initial stage of place field stabilization, CA1 cells may show gradually
diminishing direction sensitivity. If this conjecture is not supported by experimental findings, then
place field formation cannot be explained by applying purely statistical considerations.



Top-down influences: The key role of the deep layers of EC in extracting temporal dependencies (i.e. 
separating the predictable parts) implies that perturbation at these layers would result in a faulty 
prediction system and a weaker representation of directions in the subicular complex. In particular, 
changes in the activity of EC superficial layer neurons are expected. If such changes indeed exist, 
then the characteristic properties of these changes provide information about top-down influences 
on input filtering: modulation of the internal dynamical model may change the information that 
traverses to the CA1 subfield.

Distortions: In the model, parametric distortion of the grids [Barry et al., 2007] may only be demonstrated
by providing information about the motor efferents or by providing access to control the
processing of sensory information.

Processing along the direct pathway is faster, as fewer transformations are involved. However, when
temporal correlations are present, the resulting components may be distorted. In this case, the tri-synaptic
pathway is expected to become dominant, as it can diminish these correlations. In sum, the varying influence
of the two pathways may cause the temporary direction selectivity of the emerging place fields.

In our proposal, learning in the direct and tri-synaptic pathways takes place at different speeds. As
independent sources should be developed on i.i.d. samples, we would expect that the CA1 responses are
defined by the tri-synaptic pathway at least at the fine tuning stage of learning. According to the experiments of
[Sybirska et al., 2000], each pathway can form stable place fields in the absence of the other. The processing
along the direct pathway is probably faster [Leutgeb et al., 2004], and we think that this is due to the lack of
temporal decorrelation in this pathway. However, when temporal decorrelation is present, the tri-synaptic
route may take the lead in tuning CA1. Both proximal and distant dendrites may play a role in learning the
separation transformations, especially in the coordination of the ICA components.

The second prediction emphasizes the fact that physical constraints of the animal's motion set the temporal
scale of changes in direction. If the predictive internal model cannot correctly register this timescale, then
extraction of this kind of information will be impaired while recovery of positional information remains
intact.

The last prediction deserves some comments. As the main computations in our model are aimed at
characterizing a set of inputs by extracting statistical information, any change in the underlying statistics would
result in strong distortions of the emerging activity pattern (see Simulations #3 and #4). Introducing
control would affect the expectations of the internal model, which in turn would modify the predictions as
well. If control information can be used as an internal metric, then it may help to recalibrate allocentric
sensory information by modifying the expectations. This process, however, differs from relearning the
underlying statistics. Although a surprisingly large number of properties can already be shown by our simple
model without any information on motor actions, it is known [Buzsaki, 2006] that maps in
the hippocampus need motor actions, and that dead reckoning builds on explorative trajectories with long
and almost straight segments with intermittent random turns.



Falsifying issues 

We consider the consequences of our predictions crucial to our model concept. The first issue is that
our model relies heavily on the strong coupling between bottom-up and top-down information for both the
whitening and the separation stages. In addition, the claim concerning basically identical transformations (the
similarity between Wdir and Wtri) is quite restrictive. If this constraint is not experimentally supported, then
serious reconsideration seems necessary.

The other issue regards the effect of goal-oriented behavior. Although we have seen that our model 
yields orthogonal hexagrid tiling, the resulting grids are not oriented. Oriented grids, however, may not be 
formed without additional constraints in our model. Based on the arguments concerning the differentiation 
of internal and external observation signals, we believe that integration with control [Szita and Lorincz, 2004]
over the observation process could yield the desired property. 

Open issues



While the model we have proposed successfully replicated the reported space-dependent activity at
different areas of the HR, several questions remain unanswered. First, we enumerate issues related either to
the current stage of our model construction or to the particular form of the presented simulations.

In our simulations we used locally defined inputs and did not model sensory associations between local 
and distal cues. Such binding is not trivial and remains a hot issue for example in computer vision. 

Although we showed that separation of relevant low-dimensional subspaces is possible, the mechanism of 
regrouping or fusion of the factors belonging to the same subspace is not yet known. We suppose that the 
particular cross talk between CA1 and the subiculum [Gigg, 2006] may provide a clue.

As regards prediction, even for the simplest case of a first order autoregressive process the training of
the predictive matrix is quite involved, as the required innovation and signal terms are supposedly stored
in different areas. In turn, the timing of their arrival is fundamental. At present it is not known what kind
of network mechanism may set the timing. As was suggested in [Dragoi and Buzsaki, 2006], one candidate
would be the network level theta oscillation that may gate information transfer to the deep layers of EC. In
favor of this proposal, it is known that deep layer principal cells have distinctive theta modulation properties
(see, e.g., [Chrobak et al., 2000] and references therein) and LTP in the deep layers of the EC may be
preferentially responsive to slow patterned activity [Yun et al., 2002].

In the following, we name a few important properties of the hippocampal region not yet integrated into 
the model. 

One relevant question concerns how the memory system can store information after one encounter
('one-shot' learning). This phenomenon probably requires an additional mechanism not yet incorporated into our
model, since it is not based on statistical learning principles. Such a mechanism could be simple and Hebbian
[Kormendy-Racz et al., 1999].

Setting aside this prompt learning, consolidating the acquired knowledge usually takes more time.
Presumably, sequential replay of previously formed activity patterns in CA3 may facilitate this process. In line
with our initial assumptions we conjecture that forward replay may actually help shape the predictive system,
while reverse replay is required to form better strategies for goal-oriented behavior [Sutton and Barto, 1998].
To define goals and behavior for our system, first a control mechanism should be integrated. Such a
mechanism would affect the sampling of the available inputs by changing the trajectory. In the simulations, we
introduced one form of temporal convolution, but it is known that the HR is able to represent sequences of
spatiotemporal activity patterns in a temporally compressed form of varying timescales. Such highly
versatile convolution makes decoding even harder. It was suggested [Lorincz and Buzsaki, 2000] that this task is
assigned to the EC-DG-CA3 loop. A further improvement of our model would be to incorporate this loop
as well.